Properties of contact matrices induced by pairwise interactions in proteins
The total conformational energy is assumed to consist of pairwise interaction
energies between atoms or residues, each of which is expressed as a product of
a conformation-dependent function (an element of a contact matrix, C-matrix)
and a sequence-dependent energy parameter (an element of a contact energy
matrix, E-matrix). Such pairwise interactions in proteins force native
C-matrices to be in a relationship as if the interactions were a Go-like
potential [N. Go, Annu. Rev. Biophys. Bioeng. 12, 183 (1983)] for the native
C-matrix, because the lower bound of the total energy function is equal to the
total energy of the native conformation interacting in a Go-like pairwise
potential. This relationship between C- and E-matrices corresponds to (a) a
parallel relationship between the eigenvectors of the C- and E-matrices and a
linear relationship between their eigenvalues, and (b) a parallel relationship
between a contact number vector and the principal eigenvectors of the C- and
E-matrices; the E-matrix is expanded in a series of eigenspaces with an
additional constant term, which corresponds to a threshold of contact energy
that approximately separates native contacts from non-native ones. These
relationships are confirmed in 182 representatives from each family of the SCOP
database by examining inner products between the principal eigenvector of the
C-matrix, that of the E-matrix evaluated with a statistical contact potential,
and a contact number vector. In addition, the spectral representation of C- and
E-matrices reveals that pairwise residue-residue interactions, which depend
only on the types of interacting amino acids and not on other residues in a
protein, are insufficient, and other interactions, including residue
connectivity and steric hindrance, are needed to make native structures the
unique lowest-energy conformations.
Comment: Errata in DOI:10.1103/PhysRevE.77.051910 have been corrected in the
present version.
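The inner-product test described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the C-matrix is assumed to be a binary distance-cutoff contact map (the `cutoff` and `min_sep` values are hypothetical), the "protein" is a hypothetical random-walk chain, and the E-matrix is a hypothetical Go-like term plus noise standing in for a statistical contact potential. The total energy is the sum over pairs of C_ij E_ij, and the "principal" eigenvector of the E-matrix is taken here, as an assumption, to be the one for its most negative eigenvalue.

```python
import numpy as np

def contact_matrix(coords: np.ndarray, cutoff: float = 8.0, min_sep: int = 2) -> np.ndarray:
    """Binary, symmetric toy C-matrix: C[i, j] = 1 if residues i and j
    (|i - j| >= min_sep) are closer than `cutoff` (hypothetical definition)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    idx = np.arange(len(coords))
    far_in_sequence = np.abs(idx[:, None] - idx[None, :]) >= min_sep
    return ((d < cutoff) & far_in_sequence).astype(float)

def principal_eigenvector(m: np.ndarray, most_negative: bool = False) -> np.ndarray:
    """Unit eigenvector of a symmetric matrix for its largest eigenvalue
    (or for its most negative one, used here for the E-matrix)."""
    vals, vecs = np.linalg.eigh(m)           # eigenvalues in ascending order
    v = vecs[:, 0] if most_negative else vecs[:, -1]
    return v if v.sum() >= 0 else -v          # fix the arbitrary overall sign

def unit(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
# Hypothetical 60-residue chain: a random walk with 3.8-unit steps.
steps = rng.normal(size=(60, 3))
steps = 3.8 * steps / np.linalg.norm(steps, axis=1, keepdims=True)
coords = np.cumsum(steps, axis=0)

C = contact_matrix(coords)
# Hypothetical E-matrix: Go-like term -C plus small symmetric noise.
noise = rng.normal(scale=0.1, size=C.shape)
E = -C + (noise + noise.T) / 2

energy = np.sum(np.triu(C * E, k=1))      # sum over i < j of C_ij * E_ij
n_contacts = C.sum(axis=1)                # contact number vector
vC = principal_eigenvector(C)
vE = principal_eigenvector(E, most_negative=True)
print(f"energy = {energy:.2f}")
print(f"|vC.vE| = {abs(vC @ vE):.3f}, |vC.n| = {abs(vC @ unit(n_contacts)):.3f}, "
      f"|vE.n| = {abs(vE @ unit(n_contacts)):.3f}")
```

With a pure Go-like E-matrix (for example `E = -2.0 * C`), the two principal eigenvectors coincide up to sign, so their inner product is 1; the noise term shows how a non-ideal potential moves that value below 1.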
On the optimal contact potential of proteins
We analytically derive the lower bound of the total conformational energy of
a protein structure by assuming that the total conformational energy is well
approximated by the sum of sequence-dependent pairwise contact energies. The
condition for the native structure to achieve the lower bound leads to a
contact energy matrix that is a scalar multiple of the native contact matrix,
i.e., the so-called Go potential. We also derive spectral relations between the
contact matrix and the energy matrix, and approximations related to
one-dimensional protein structures. Implications for protein structure
prediction are discussed.
Comment: 5 pages, text only
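A brute-force check of the Go-potential condition on a toy chain can be sketched as follows. It is an illustration under stated assumptions, not the analytic derivation: the chain length `n`, the `native` contact set, and `eps` are hypothetical, and because a pure Go potential assigns zero energy to non-native contacts, only decoys with the same number of contacts as the native structure are compared. Under that restriction the native contact set is the unique lowest-energy contact map.

```python
from itertools import combinations
import numpy as np

def pair_energy(contacts, E: np.ndarray) -> float:
    """Total pairwise energy: sum of E[i, j] over contacting pairs (i, j), i < j."""
    return float(sum(E[i, j] for i, j in contacts))

n = 6                                                        # hypothetical toy chain length
pairs = [(i, j) for i in range(n) for j in range(i + 2, n)]  # 10 non-adjacent pairs
native = {(0, 3), (1, 4), (2, 5), (0, 5)}                    # hypothetical native contacts

# Go-like E-matrix: a negative scalar multiple of the native contact matrix.
eps = 1.0
E = np.zeros((n, n))
for i, j in native:
    E[i, j] = E[j, i] = -eps

# Energy of every decoy with the same number of contacts as the native structure.
energies = {frozenset(c): pair_energy(c, E) for c in combinations(pairs, len(native))}
best = min(energies.values())
minimizers = [c for c, e in energies.items() if e == best]
print(best, minimizers == [frozenset(native)])   # -> -4.0 True
```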
Selective Constraints on Amino Acids Estimated by a Mechanistic Codon Substitution Model with Multiple Nucleotide Changes
Empirical substitution matrices represent the average tendencies of
substitutions over various protein families by sacrificing gene-level
resolution. We develop a codon-based model in which the mutational tendencies of
codons, the genetic code, and the strength of selective constraints against amino
acid replacements can be tailored to a given gene. First, selective constraints
averaged over proteins are estimated by maximizing the likelihood of each 1-PAM
matrix of empirical amino acid (JTT, WAG, and LG) and codon (KHG) substitution
matrices. Then, selective constraints specific to given proteins are
approximated as a linear function of those estimated from the empirical
substitution matrices.
Akaike information criterion (AIC) values indicate that a model allowing
multiple nucleotide changes fits the empirical substitution matrices
significantly better. Also, the ML estimates of transition-transversion bias
obtained from these empirical matrices are not as large as previously
estimated. The selective constraints are characteristic of proteins rather than
species. However, their relative strengths among amino acid pairs can be
approximated as depending mainly on the amino acid pairs rather than on protein
families, because the present model, in which selective constraints are
approximated as a linear function of those estimated from the JTT/WAG/LG/KHG
matrices, can provide a good fit to other empirical substitution matrices,
including cpREV for chloroplast proteins and mtREV for vertebrate mitochondrial
proteins.
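The AIC comparison mentioned above weighs goodness of fit against the number of free parameters, AIC = 2k - 2 ln L, and the model with the smaller AIC is preferred. The sketch below shows only that bookkeeping; the log-likelihoods and parameter counts in `fits` are hypothetical placeholders, not values from the article.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def prefer(models: dict[str, tuple[float, int]]) -> str:
    """Name of the model with the smallest AIC; values are (ln L, k)."""
    return min(models, key=lambda name: aic(*models[name]))

# Hypothetical fits (log-likelihood, number of free parameters), for illustration only.
fits = {
    "single nucleotide changes": (-5230.0, 8),
    "multiple nucleotide changes": (-5190.0, 10),
}
for name, (lnL, k) in fits.items():
    print(f"{name}: AIC = {aic(lnL, k):.1f}")
print("preferred:", prefer(fits))
```

The extra parameters of a richer model are only worth keeping when they raise ln L by more than the 2k penalty they add.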
The present codon-based model with the ML estimates of selective constraints
and with adjustable nucleotide mutation rates would be useful as a simple
substitution model in ML and Bayesian inferences of molecular phylogenetic
trees, and it enables us to obtain biologically meaningful information at both
nucleotide and amino acid levels from codon and protein sequences.
Comment: Table 9 in this article includes corrections for errata in Table
9 as published in 10.1371/journal.pone.0017244. Supporting information is
attached at the end of the article, and a computer-readable dataset of the ML
estimates of selective constraints is available from
10.1371/journal.pone.001724
Inference of Co-Evolving Site Pairs: an Excellent Predictor of Contact Residue Pairs in Protein 3D structures
Residue-residue interactions that fold a protein into a unique
three-dimensional structure and make it play a specific function impose
structural and functional constraints on each residue site. Selective
constraints on residue sites are recorded in the amino acid order of homologous
sequences and also in the evolutionary trace of amino acid substitutions. A
challenge is to extract direct dependences between residue sites by removing
indirect dependences through other residues within a protein or even through
other molecules. Recent attempts at disentangling direct from indirect
dependences of amino acid types between residue positions in multiple sequence
alignments have revealed that the strength of inferred residue pair couplings
is an excellent predictor of residue-residue proximity in folded structures.
Here, we report an alternative attempt at inferring co-evolving site pairs from
concurrent and compensatory substitutions between sites in each branch of a
phylogenetic tree. First, branch lengths of a phylogenetic tree inferred by the
neighbor-joining method are optimized, together with other parameters, by
maximizing the likelihood of the tree in a mechanistic codon substitution model.
Mean changes in quantities that are characteristic of concurrent and compensatory
substitutions, accompanying substitutions at each site in each branch of the
tree, are estimated with the likelihood of each substitution. Partial
correlation coefficients of the characteristic changes along branches between
sites are calculated and used to rank co-evolving site pairs. The accuracy of
contact prediction based on the present co-evolution score is comparable to
that achieved by a maximum entropy model of protein sequences for 15 protein
families taken from Pfam release 26.0. Moreover, this excellent accuracy
indicates that compensatory substitutions are significant in protein evolution.
Comment: 17 pages, 4 figures, and 4 tables with supplementary information of 5
figures
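The ranking step can be sketched as follows, assuming the characteristic changes have already been collected into a branches-by-sites matrix `X`. The likelihood-weighted estimation of those changes is not reproduced here, and the data below are hypothetical and synthetic. Partial correlations are computed with one standard approach, from the inverse of the correlation matrix (r_ij = -P_ij / sqrt(P_ii P_jj)); ranking by the signed value is also an assumption. The synthetic example shows the point made above: sites 0 and 2 are correlated only through site 1, so their raw correlation is large while their partial correlation is near zero.

```python
import numpy as np

def partial_correlations(X: np.ndarray) -> np.ndarray:
    """Partial correlation between every pair of columns of X (rows = branches,
    columns = sites), controlling for all other columns.

    Uses the precision matrix P = inverse of the correlation matrix:
    r_ij = -P_ij / sqrt(P_ii * P_jj).
    """
    R = np.corrcoef(X, rowvar=False)
    P = np.linalg.pinv(R)                # pseudo-inverse tolerates a near-singular R
    d = np.sqrt(np.diag(P))
    pc = -P / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc

def rank_site_pairs(X: np.ndarray, top: int = 3) -> list[tuple[int, int, float]]:
    """Site pairs (i, j, r_ij) sorted by decreasing partial correlation."""
    pc = partial_correlations(X)
    n = pc.shape[0]
    pairs = [(i, j, float(pc[i, j])) for i in range(n) for j in range(i + 1, n)]
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:top]

# Hypothetical data: 2000 branches x 5 sites. Site 1 follows site 0 and site 2
# follows site 1 (indirect 0-2 dependence); sites 3 and 4 are independent.
rng = np.random.default_rng(1)
e = rng.normal(size=(2000, 5))
X = e.copy()
X[:, 1] = X[:, 0] + e[:, 1]
X[:, 2] = X[:, 1] + e[:, 2]

print("raw r(0,2) =", round(float(np.corrcoef(X, rowvar=False)[0, 2]), 2))
print("partial r(0,2) =", round(float(partial_correlations(X)[0, 2]), 2))
print(rank_site_pairs(X, top=2))
```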
Advantages of a Mechanistic Codon Substitution Model for Evolutionary Analysis of Protein-Coding Sequences
A mechanistic codon substitution model, in which each codon substitution rate is proportional to the product of a codon mutation rate and the average fixation probability depending on the type of amino acid replacement, has advantages over nucleotide, amino acid, and empirical codon substitution models in evolutionary analysis of protein-coding sequences. It can approximate a wide range of codon substitution processes. If no selection pressure on amino acids is taken into account, it becomes equivalent to a nucleotide substitution model. If mutation rates are assumed not to depend on the codon type, it becomes essentially equivalent to an amino acid substitution model. Mutation at the nucleotide level and selection at the amino acid level can be evaluated separately. The present scheme for single nucleotide mutations is equivalent to the general time-reversible model, but multiple nucleotide changes in infinitesimal time are allowed. Selective constraints on the respective types of amino acid replacements are tailored to each gene as a linear function of a given estimate of selective constraints. Good estimates are those calculated by maximizing the respective likelihoods of empirical amino acid or codon substitution frequency matrices. Akaike and Bayesian information criteria indicate that the present model performs far better than the other substitution models for all five phylogenetic trees of highly divergent to highly homologous sequences of chloroplast, mitochondrial, and nuclear genes. It is also shown that multiple nucleotide changes in infinitesimal time are significant in long branches, although they may be caused by compensatory substitutions or other mechanisms. Allowing selective constraints to vary over sites fits the datasets significantly better than allowing mutation rates to vary, except for 10 slow-evolving nuclear genes of 10 mammals.
A critical finding for phylogenetic analysis is that assuming variable mutation rates over sites leads to the overestimation of branch lengths.
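The rate structure described above, in which a mutation rate is multiplied by a fixation factor that depends on the amino acid replacement, can be sketched as follows. Several simplifications are assumptions and not the article's model. Mutation follows an HKY-like scheme, a special case of the general time-reversible model, with hypothetical `PI` and `KAPPA`. Only single-nucleotide changes are included, whereas the article also allows multiple nucleotide changes. A single hypothetical constraint `w` gives every nonsynonymous change the factor exp(-w), in place of the article's pair-specific selective constraints. Setting `w = 0` shows the stated reduction to a nucleotide-level model.

```python
import itertools
import numpy as np

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AA)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]   # 61 sense codons

# Hypothetical mutation parameters (HKY-like): base frequencies and ts/tv ratio.
PI = {"T": 0.25, "C": 0.25, "A": 0.25, "G": 0.25}
KAPPA = 2.0

def hky_rate(x: str, y: str) -> float:
    """Mutation rate from nucleotide x to y: KAPPA * PI[y] for transitions, PI[y] otherwise."""
    transition = {x, y} in ({"C", "T"}, {"A", "G"})
    return (KAPPA if transition else 1.0) * PI[y]

def uniform_constraint(w: float):
    """Hypothetical fixation factor: 1 for synonymous, exp(-w) for nonsynonymous changes."""
    return lambda a, b: 1.0 if a == b else float(np.exp(-w))

def codon_rate_matrix(nuc_rate, fixation) -> np.ndarray:
    """61 x 61 rate matrix Q: Q[i, j] = mutation rate * fixation factor for
    single-nucleotide codon changes; diagonal set so that rows sum to zero."""
    n = len(SENSE)
    Q = np.zeros((n, n))
    for i, ci in enumerate(SENSE):
        for j, cj in enumerate(SENSE):
            diffs = [k for k in range(3) if ci[k] != cj[k]]
            if len(diffs) != 1:          # this sketch ignores multiple-nucleotide changes
                continue
            k = diffs[0]
            Q[i, j] = nuc_rate(ci[k], cj[k]) * fixation(CODE[ci], CODE[cj])
        Q[i, i] = -Q[i].sum()
    return Q

Q_neutral = codon_rate_matrix(hky_rate, uniform_constraint(0.0))   # w = 0: nucleotide-level model
Q_selected = codon_rate_matrix(hky_rate, uniform_constraint(1.5))
print(Q_neutral.shape, np.allclose(Q_selected.sum(axis=1), 0.0))   # -> (61, 61) True
```

Keeping mutation (`nuc_rate`) and selection (`fixation`) as separate arguments mirrors the point above that the two levels can be evaluated separately.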